AITopics

2510.15569

Country: Asia (0.14)

Genre: Research Report > New Finding (0.67)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.72)

Park, Jaihyun, Cordell, Ryan

A Quantitative Discourse Analysis of Asian Workers in the US Historical Newspapers

arXiv.org Artificial Intelligence, Feb-4-2024

Warning: This paper contains examples of offensive language targeting marginalized populations. The digitization of historical texts invites researchers to explore large-scale historical corpora with computational methods. In this study, we present a computational text analysis of a relatively understudied topic: how Asian workers are represented in historical newspapers in the United States. We found that the word "coolie" was semantically different in some States (e.g., Massachusetts, Rhode Island, Wyoming, Oklahoma, and Arkansas), reflecting the different discourses around the term. By measuring over-represented words, we also found that then-Confederate newspapers and then-Union newspapers formed distinctive discourses. Newspapers from then-Confederate States associated coolie with slavery-related words. In addition, we found that Asians were perceived to be inferior to European immigrants and were targets of racism. This study supplements the qualitative analysis of racism in the United States with quantitative discourse analysis.

coolie, newspaper, united states, (15 more...)

2402.02572

Country:

North America > United States > Massachusetts (0.26)
North America > United States > Rhode Island (0.26)
North America > United States > Arkansas (0.25)
(50 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Media > News (1.00)
Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.70)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.49)

arXiv.org Artificial Intelligence, Jul-26-2023

The flow of ideas in word embeddings

Dasgupta, Debayan

The flow of ideas has been extensively studied by physicists, psychologists, and machine learning engineers. This paper adopts specific tools from microrheology to investigate the similarity-based flow of ideas. We introduce a random walker in word embeddings and study its behavior. Such similarity-mediated random walks through the embedding space show signatures of anomalous diffusion commonly observed in complex structured systems such as biological cells and complex fluids. The paper concludes by proposing the application of popular tools employed in the study of random walks and diffusion of particles under Brownian motion to assess quantitatively the incorporation of diverse ideas in a document. Overall, this paper presents a self-referenced method combining microrheology and machine learning concepts to explore the meandering tendencies of language models and their potential association with creativity.
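As a rough illustration of the idea of a similarity-mediated random walk, the sketch below moves a walker through a toy embedding space and measures its mean squared displacement (MSD) at several lags. The abstract does not specify the transition rule, so everything here is an assumption made for illustration: the embeddings are random unit vectors rather than a trained model, the walker jumps uniformly to one of its `k` most cosine-similar neighbours, and the names `walk` and `msd` are hypothetical. Anomalous diffusion would show up as an MSD that scales with lag as a power law whose exponent differs from 1.

```python
import math
import random


def unit(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cosine(u, v):
    """Cosine similarity; a plain dot product because inputs are unit vectors."""
    return sum(a * b for a, b in zip(u, v))


def walk(emb, start, steps, k, rng):
    """Assumed rule: jump uniformly to one of the k most similar other words."""
    path = [start]
    cur = start
    for _ in range(steps):
        others = [j for j in range(len(emb)) if j != cur]
        others.sort(key=lambda j: cosine(emb[cur], emb[j]), reverse=True)
        cur = rng.choice(others[:k])
        path.append(cur)
    return path


def msd(emb, path, lag):
    """Mean squared Euclidean displacement between positions `lag` steps apart."""
    sq = [
        sum((a - b) ** 2 for a, b in zip(emb[path[t + lag]], emb[path[t]]))
        for t in range(len(path) - lag)
    ]
    return sum(sq) / len(sq)


rng = random.Random(0)  # fixed seed so the toy run is repeatable
# Toy stand-in for word embeddings: 50 random 8-dimensional unit vectors.
emb = [unit([rng.gauss(0, 1) for _ in range(8)]) for _ in range(50)]
path = walk(emb, start=0, steps=200, k=5, rng=rng)
msd_by_lag = {lag: msd(emb, path, lag) for lag in (1, 2, 4, 8)}

for lag, value in msd_by_lag.items():
    print(lag, round(value, 4))
```

Fitting a line to log(MSD) against log(lag) would give a scaling exponent; with random toy vectors that number says nothing about real language models.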

artificial intelligence, machine learning, natural language, (20 more...)

2307.16819

Country: Asia > India > Karnataka > Bengaluru (0.04)

Genre: Research Report (0.50)

Industry: Energy (0.36)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.96)

#artificialintelligence, Jan-4-2023, 15:15:48 GMT

New Developments in the field of Computation and Language part 2 (Artificial Intelligence)

Abstract: We present the first openly available multimodal metaphor annotated corpus. The corpus consists of videos including audio and subtitles that have been annotated by experts. Furthermore, we present a method for detecting metaphors in the new dataset based on the textual content of the videos. The method achieves a high F1-score (62%) for metaphorical labels. We also experiment with other modalities and multimodal methods; however, these methods did not outperform the text-based model.

artificial intelligence, machine learning, natural language, (8 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.36)

arXiv.org Artificial Intelligence, Sep-29-2022

Synonym Detection Using Syntactic Dependency And Neural Embeddings

Yang, Dongqiang, Wang, Pikun, Sun, Xiaodong, Li, Ning

Recent advances on the Vector Space Model have significantly improved some NLP applications such as neural machine translation and natural language generation. Although word co-occurrences in context have been widely used in counting-/predicting-based distributional models, the role of syntactic dependencies in deriving distributional semantics has not yet been thoroughly investigated. By comparing various Vector Space Models in detecting synonyms in TOEFL, we systematically study the salience of syntactic dependencies in accounting for distributional similarity. We separate syntactic dependencies into different groups according to their various grammatical roles and then use context-counting to construct their corresponding raw and SVD-compressed matrices. Moreover, using the same training hyperparameters and corpora, we study typical neural embeddings in the evaluation. We further study the effectiveness of injecting human-compiled semantic knowledge into neural embeddings on computing distributional similarity. Our results show that the syntactically conditioned contexts can interpret lexical semantics better than the unconditioned ones, whereas retrofitting neural embeddings with semantic knowledge can significantly improve synonym detection.
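To make the evaluation setup concrete, here is a minimal sketch of a TOEFL-style synonym question answered with a context-counting model: each word is represented by counts of its neighbouring words, and the candidate whose count vector is most cosine-similar to the target is chosen. This is not the authors' pipeline. It uses plain window contexts rather than syntactic dependencies, applies no SVD compression, and the corpus, the window size, and the function names (`context_counts`, `pick_synonym`) are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def context_counts(sentences, window=2):
    """Count, for every word, the words seen within `window` positions of it."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def pick_synonym(counts, target, candidates):
    """Answer a multiple-choice question: the most similar candidate wins."""
    return max(candidates, key=lambda c: cosine(counts[target], counts[c]))


# Hypothetical toy corpus in which "big" and "large" share their contexts.
corpus = [
    "the big dog chased the cat",
    "the large dog chased the cat",
    "a big house stood there",
    "a large house stood there",
    "the small bird sang loudly",
    "the red car drove fast",
]
counts = context_counts(corpus)
answer = pick_synonym(counts, "big", ["large", "small", "red", "fast"])
print(answer)
```

Conditioning on syntactic dependencies, as the abstract describes, would amount to changing which neighbours `context_counts` records; the multiple-choice scoring step would stay the same.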

artificial intelligence, natural language, text processing, (19 more...)

2209.15202

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(26 more...)

Genre: Research Report > New Finding (0.54)

Industry:

Banking & Finance > Economy (0.47)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.93)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.88)

Mostafavi, Moeen, Varnosfaderani, Mahsa Pahlavikhah, Nikseresht, Fateme, Mansouri, Seyed Ahmad

emojiSpace: Spatial Representation of Emojis

arXiv.org Artificial Intelligence, Sep-12-2022

In the absence of nonverbal cues during messaging communication, users express part of their emotions using emojis. Thus, having emojis in the vocabulary of text messaging language models can significantly improve many natural language processing (NLP) applications such as online communication analysis. On the other hand, word embedding models are usually trained on a very large corpus of text, such as the Wikipedia or Google News datasets, that includes very few samples with emojis. In this study, we create emojiSpace, which is a combined word-emoji embedding using the word2vec model from the Gensim library in Python. We trained emojiSpace on a corpus of more than 4 billion tweets and evaluated it by implementing sentiment analysis on a Twitter dataset containing more than 67 million tweets as an extrinsic task. For this task, we compared the performance of two classifiers: random forest (RF) and linear support vector machine (SVM). For evaluation, we compared emojiSpace performance with two other pre-trained embeddings and demonstrated that emojiSpace outperforms both.

artificial intelligence, machine learning, natural language, (19 more...)

2209.09871

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Virginia (0.05)
North America > United States > New York > New York County > New York City (0.04)
Europe > Slovenia (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.86)

#artificialintelligence, Jan-14-2022, 16:35:40 GMT

Word2vec with PyTorch: Implementing the Original Paper

Word embeddings are the most fundamental concept in deep natural language processing, and word2vec is one of the earliest algorithms used to train them. In this post, I want to go deeper into the first paper on word2vec -- Efficient Estimation of Word Representations in Vector Space (2013), which as of now has 24k citations, and this number is still growing. I am attaching my GitHub project with word2vec training. We will go through it in this post.

dataset, skip-gram model, vector, (15 more...)

Country:

Europe > Germany (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)

#artificialintelligence, Nov-9-2021, 12:15:25 GMT

Autocorrect Feature using NLP in Python

This article was published as a part of the Data Science Blogathon. Natural Language Processing (NLP) is the field of artificial intelligence that connects linguistics to computer science. I am assuming that you have understood the basic concepts of NLP, so we will move ahead. Have you ever wondered how the autocorrect feature works on a smartphone keyboard?

autocorrect feature, python, smartphone, (13 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

#artificialintelligence, Oct-31-2021, 21:55:37 GMT

Word2vec with PyTorch: Implementing the Original Paper

dataset, skip-gram model, vector, (15 more...)

Country:

Europe > Germany (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)

#artificialintelligence, Nov-30-2020, 21:19:25 GMT

Word Embeddings in High-Level

The most common representation of words in NLP tasks is one-hot encoding. Below we can see an example of one-hot encoding for the words "Cat" and "Dog". As we can see, these two vectors are independent since their inner product is 0, and their Euclidean distance is \(\sqrt{2}\). Notice that this applies to every pair of distinct words in the vocabulary: each pair is independent, and the distance is always \(\sqrt{2}\). For example, the words below are considered independent, and the distance (and hence the similarity) between any pair of words is the same. This is an issue for NLP tasks since we want to be able to capture the relation between words.
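The claim about inner products and distances can be checked with a few lines of Python. This is a minimal sketch over a hypothetical three-word vocabulary (the word "car" and the helper names are assumptions added for illustration): every pair of distinct one-hot vectors has inner product 0 and Euclidean distance equal to the square root of 2, so the encoding cannot say that "cat" is closer to "dog" than to "car".

```python
import math


def one_hot(index, size):
    """Return a one-hot vector of length `size` with a 1 at position `index`."""
    return [1 if i == index else 0 for i in range(size)]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


vocab = ["cat", "dog", "car"]  # hypothetical toy vocabulary
vectors = {word: one_hot(i, len(vocab)) for i, word in enumerate(vocab)}

# Compare every pair of distinct words once.
for i, a in enumerate(vocab):
    for b in vocab[i + 1:]:
        print(a, b, dot(vectors[a], vectors[b]), euclidean(vectors[a], vectors[b]))
```

The result does not depend on the vocabulary size: two distinct one-hot vectors always differ in exactly two positions, so the squared distance is always 1 + 1 = 2.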

dimension, similar word, word embedding, (13 more...)

Country:

Europe > Italy > Piedmont > Turin Province > Turin (0.05)
Europe > Greece (0.05)
Europe > France (0.05)

Industry: Consumer Products & Services > Food, Beverage, Tobacco & Cannabis > Beverages (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.33)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.31)